Capstone Project - The Battle of the Neighborhoods (Week 2)

Applied Data Science Capstone by IBM/Coursera

Introduction: Business Problem

In this project we will try to find an optimal location for a Yoga Center. This report is aimed at stakeholders interested in starting a new Yoga Center in Bangalore, India.

Bangalore, also known as the Silicon Valley of India, is home to millions of people from all over India who work in various IT companies. A booming cosmopolitan population has meant huge growth potential in the tertiary service industry. Naturally, in recent times the franchise model of business has taken hold in many parts of Bangalore. The same holds true for the Health and Fitness sector.

As more and more people become fitness aware, competition has become cut-throat. Businesses therefore have to look into niche offerings such as Yoga and Zumba. A dedicated Yoga Center is quite a lucrative offering since it addresses both mind and body fitness. The target audience is also larger, since Yoga can be taken up by people of all age groups, whereas a regular Gym or Zumba center may only target the population between 20 and 40.

While a Gym may not be in direct competition with a Yoga Center, we will try to find locations which do not have any fitness center in the vicinity. We will also seek out locations which are close to residential areas, offices, colleges and commercial centers.

We will use data science tools and techniques to generate a few of the most favorable areas based on the above criteria.

Data

Based on the definition of our problem, the factors that will influence our decision are:

  • Number of existing Gyms in the neighborhood.
  • Number of and distance to Yoga Centers in the neighborhood.
  • Number of homes, offices, colleges and commercial venues in the neighborhood.

Bangalore has several suburbs and satellite locations which may not be ideal for starting a new business. So we will first determine the effective subset of Bangalore for our analysis.

Then we will define closely packed circular grid cells of radius around 500 metres for our analysis.

We will use the following data sources to generate the required information:

  • The suburbs in Bangalore: we will use Wikipedia to get the list of suburbs in Bangalore.
  • Then we will determine an effective radius which covers the maximum number of suburbs.
  • Then we generate the grid centres/locations and determine their approximate addresses using the OpenCage API.
  • In general, spatial coordinates and reverse geocoding will be obtained using the OpenCage API.
  • The number of Gyms, Yoga Centers and amenities (homes, offices, colleges, commercial centers) in every location will be obtained using the Foursquare API.
  • We will also need several Python libraries, such as Pandas, NumPy and Folium, for our analysis.

Let's first install and import the required libraries.

In [14]:
!pip install opencage
!pip install shapely
!pip install pyproj
!pip install folium

import pandas as pd
import numpy as np 
import folium

import pyproj
import math
import pickle


from sklearn.cluster import KMeans
import requests
from pandas.io.json import json_normalize
In [3]:
from IPython.display import HTML
import base64

def create_download_link( df, title = "Download CSV file", filename = "data.csv"):  
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)

Let's get the list of suburbs in Bangalore.

In [18]:
web_data = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore',header=0)

frames = []
 
for i in range(0,8):
    frames.append(web_data[i])

bangalore = pd.concat(frames)
bangalore.drop(['Image','Summary'],axis=1,inplace = True)
bangalore.reset_index(drop=True,inplace=True)
bangalore["Latitude"] = np.zeros(len(bangalore),dtype = float)
bangalore["Longitude"] = np.zeros(len(bangalore),dtype = float)
    

bangalore.head()
Out[18]:
Name Latitude Longitude
0 Cantonment area 0.0 0.0
1 Domlur 0.0 0.0
2 Indiranagar 0.0 0.0
3 Jeevanbheemanagar 0.0 0.0
4 Malleswaram 0.0 0.0

Now using the OpenCage API we determine spatial coordinates of the suburbs

In [21]:
# determine the Spatial coordinates of Bangalore 
from opencage.geocoder import OpenCageGeocode
from pprint import pprint
import numpy as np
blr_lat = 0.0
blr_long = 0.0

geocoder = OpenCageGeocode(GEO_KEY)
query = u'Bangalore,Karnataka,India'
results = geocoder.geocode(query)
blr_lat = results[0]['geometry']['lat']
blr_long = results[0]['geometry']['lng']
#print(results[0]['geometry']['lat'],results[0]['geometry']['lng'])
print("Coordinates of Bangalore ",blr_lat,blr_long)
blr_center = [blr_lat,blr_long]
   
Coordinates of Bangalore  12.9791198 77.5912997
In [22]:
for i in bangalore.index:
    query = u''+str(bangalore['Name'][i])+' Bangalore,Karnataka,India'
    results = geocoder.geocode(query)
    bangalore.at[i,'Latitude'] = results[0]['geometry']['lat']
    bangalore.at[i,'Longitude']= results[0]['geometry']['lng']

bangalore.head()




    
Out[22]:
Name Latitude Longitude
0 Cantonment area 13.019567 77.509589
1 Domlur 12.962467 77.638196
2 Indiranagar 12.973291 77.640467
3 Jeevanbheemanagar 12.971940 77.593690
4 Malleswaram 13.016341 77.558664
In [24]:
blr_lat = np.mean(bangalore['Latitude'])
blr_long = np.mean(bangalore['Longitude'])
blr_center = [blr_lat,blr_long]

print ('Total number of Localities in Bangalore',len(bangalore))
print('Bangalore center longitude={}, latitude={}'.format(blr_center[1], blr_center[0]))
Total number of Localities in Bangalore 65
Bangalore center longitude=77.59849098615386, latitude=12.972406192307695

Now we proceed to create equally spaced circular grid cells with centres 1 km apart. In effect, each cell is a circle of radius 500 m.

To calculate distances accurately we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us work in metres rather than degrees.
So we will create functions that convert between the WGS84 spherical coordinate system (latitude/longitude in degrees) and the UTM Cartesian coordinate system (X/Y coordinates in metres).
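
To illustrate why degrees are a poor unit for distances, the sketch below estimates how many metres one degree of latitude and one degree of longitude span near Bangalore. It assumes a spherical Earth with a mean radius of 6,371 km, which is a rough approximation and not the UTM projection used in this notebook; the function name `metres_per_degree` is made up for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical approximation)

def metres_per_degree(lat_deg):
    """Return approximate metres per degree of (latitude, longitude) at lat_deg."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    # Meridians converge towards the poles, so a degree of longitude shrinks with cos(latitude)
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon

lat_m, lon_m = metres_per_degree(12.97)  # approximate latitude of Bangalore
print(round(lat_m), round(lon_m))
```

Because the two values differ, a grid that is evenly spaced in degrees would not be evenly spaced in metres, which is the reason for projecting to X/Y metres first.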

In [27]:
# Transformation routines: used to convert spatial (lon/lat) coordinates to Cartesian (UTM) coordinates and vice versa

def lonlat_to_xy(lon,lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=43, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy,lon,lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=43, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0],lonlat[1]


def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Bangalore center longitude={}, latitude={}'.format(blr_center[1], blr_center[0]))
x, y = lonlat_to_xy(blr_center[1], blr_center[0])
print('Bangalore center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Bangalore center longitude={}, latitude={}'.format(lo, la))
Coordinate transformation check
-------------------------------
Bangalore center longitude=77.59849098615386, latitude=12.972406192307695
Bangalore center UTM X=781902.4955302086, Y=1435519.9432561896
Bangalore center longitude=77.59849098615388, latitude=12.972406192307693

The steps below are used to identify the effective radius of Bangalore to be used for analysis.

  • We first determine the distance of the farthest suburb from the Bangalore centre.
  • We determine the average distance of the suburbs from the Bangalore centre.
  • We determine the median distance of the suburbs from the centre.
  • Using each of the above as the effective radius, we plot the suburbs on Folium maps to find the smallest effective radius that covers the maximum number of suburbs.
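
The comparison of candidate radii can also be made numerically. The sketch below computes the three candidate radii and the fraction of suburbs each one covers. The distances in `sample` are hypothetical values chosen for illustration, not the notebook's data, and the helper names are made up for this example.

```python
import statistics

def radius_candidates(distances_m):
    """Return the mean, median and maximum of the distances as candidate radii."""
    return {
        "mean": statistics.mean(distances_m),
        "median": statistics.median(distances_m),
        "max": max(distances_m),
    }

def coverage(distances_m, radius_m):
    """Fraction of suburbs lying within radius_m of the centre."""
    inside = sum(1 for d in distances_m if d <= radius_m)
    return inside / len(distances_m)

# Hypothetical distances from the centre in metres, for illustration only
sample = [1200, 3500, 5200, 7800, 9100, 15000, 39000]
cands = radius_candidates(sample)
for name, radius in cands.items():
    print(name, round(radius), round(coverage(sample, radius), 2))
```

A single far-away suburb pulls the maximum (and to a lesser degree the mean) outwards, while the median by construction covers only about half of the suburbs.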
In [10]:
# Compute distance of the farthest Neighbourhood from the center

blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1], blr_center[0])
Dmax = 0.0 
dist = np.zeros(len(bangalore),dtype =float)

for i in bangalore.index:
    n_x,n_y = lonlat_to_xy(bangalore['Longitude'][i],bangalore['Latitude'][i])
    d   = calc_xy_distance(blr_center_x, blr_center_y, n_x, n_y)
    dist[i] = d
    if d > Dmax:
        Dmax=d
max_dist = int(Dmax)
avg_dist = int(np.mean(dist))
median_dist = int(np.median(dist))

print(avg_dist)
print(median_dist)
print(max_dist)

bangalore['Distance from Center'] = dist
8384
7768
39140
In [11]:
# We map the suburbs and see which radius best brings in maximum number of suburbs 
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Circle(blr_center,radius = avg_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = median_dist,color ='red', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = max_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore) 
map_bangalore
Out[11]:

We observe that the smallest radius, about 7.7 km (the median), still leaves a lot of suburbs out of the reckoning. So as a trial we take the mean of the three distances and see whether the result works as an effective radius.

In [12]:
# From the map above we see that taking the max distance brings a lot of Bangalore Rural areas into our analysis.
# Let's compute the average of the mean, median and max to arrive at an approximate radius that limits the area of our analysis
blr_radius = round((median_dist+max_dist+avg_dist)/3,-3)

print(blr_radius)
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Circle(blr_center,radius = avg_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = median_dist,color ='red', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = max_dist,color ='blue', popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = blr_radius,color ='green', popup='Bangalore').add_to(map_bangalore)
for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore) 
    
map_bangalore
18000.0
Out[12]:

Thus we see that a radius of 18,000 m (18 km) covers most of the suburbs while excluding the rural areas.

So we will now limit our analysis to an area of roughly 1,018 square km (π × 18²).
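
As a quick check of the arithmetic, the sketch below recomputes the effective radius and the area of the study circle from the three distances printed earlier (8384 m, 7768 m and 39140 m).

```python
import math

# Mean, median and maximum distances in metres, as printed by the notebook
avg_dist, median_dist, max_dist = 8384, 7768, 39140

# Average of the three, rounded to the nearest 1000 m
blr_radius = round((avg_dist + median_dist + max_dist) / 3, -3)

# Area of the study circle in square kilometres
area_km2 = math.pi * (blr_radius / 1000) ** 2

print(blr_radius)          # 18000.0
print(round(area_km2, 1))  # 1017.9
```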

Now we proceed to create circular cells spaced 1 km apart.

  • The algorithm below divides a larger circle of a given radius into smaller circles of a given radius.
  • These circular cells will be the basic unit of our analysis.
  • For visualization we shall plot each cell on Folium maps.
  • Using reverse geocoding we will get approximate addresses of the cell centres.
  • We will use the Foursquare API to get the profile of each location.
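
The grid construction can be sketched in isolation, without any projection library. The example below places cell centres on a square lattice around an origin and keeps those inside the study circle. It assumes integer coordinates in metres so the distance test is exact; the function name `grid_centres` is made up for this example.

```python
def grid_centres(centre_x, centre_y, area_radius, cell_radius):
    """Return (x, y) cell centres on a square lattice that fall within the circle.

    All arguments are assumed to be integers in metres.
    """
    step = 2 * cell_radius           # neighbouring centres are one cell diameter apart
    n = area_radius // step          # number of steps from the centre to the edge
    centres = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = j * step, i * step
            # Compare squared distances to avoid floating-point square roots
            if dx * dx + dy * dy <= area_radius * area_radius:
                centres.append((centre_x + dx, centre_y + dy))
    return centres

cells = grid_centres(0, 0, 18000, 500)
print(len(cells))  # 1009
```

With an 18 km radius and 500 m cells this yields 1009 centres, the same count the notebook reports for its grid.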
In [13]:
# The algorithm below divides the circular area of Bangalore into uniform circular cells of 500 m radius.
# Each circle will be the smallest unit that will be analysed for Gyms and Yoga Centers

blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1], blr_center[0]) # City center in Cartesian coordinates


x_min = blr_center_x - blr_radius
locality_radius = 500
x_step = 2*locality_radius
y_min = blr_center_y - blr_radius 
y_step = 2 * locality_radius
x_max  = blr_center_x + blr_radius
y_max  = blr_center_y + blr_radius
n_max  = int((2*blr_radius/x_step))

print(y_min,x_min)

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0,n_max+1):
    y = y_min +  i*y_step
    for j in range(0,n_max+1):
        x = x_min +  j*x_step 
        distance_from_center = calc_xy_distance(blr_center_x, blr_center_y, x, y)
        if distance_from_center < blr_radius + 1:  # 1 m tolerance for floating-point error
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')
1417519.9432561896 763902.4955302086
1009 candidate neighborhood centers generated.
In [14]:
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
folium.Circle(blr_center,radius = blr_radius,color ='red', popup='Bangalore').add_to(map_bangalore)
#for lat, lon in zip(bangalore['Latitude'], bangalore['Longitude']):
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore) 
    folium.Circle([lat, lon],radius=locality_radius,color='blue', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
map_bangalore
Out[14]:

The cells effectively cover the area of Bangalore under study.

Now we use the OpenCage API for reverse geocoding to get the approximate address of each cell.
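
Since about a thousand cells are geocoded one request at a time, rate limits are a practical concern. The sketch below shows one possible way to retry a failing call with exponential backoff. It is not part of the original workflow: `call_with_retries` is a hypothetical helper, and `flaky_reverse_geocode` is a stand-in that simulates a rate-limited service rather than calling OpenCage.

```python
import time

def call_with_retries(func, *args, attempts=3, delay_s=1.0, retry_on=(Exception,)):
    """Call func(*args); on a listed exception wait and retry.

    Returns None if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func(*args)
        except retry_on:
            if attempt < attempts - 1:
                time.sleep(delay_s * (2 ** attempt))  # exponential backoff
    return None

# Hypothetical stand-in for a reverse-geocoding call: fails twice, then succeeds
calls = {"n": 0}

def flaky_reverse_geocode(lat, lon):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "address for {:.4f}, {:.4f}".format(lat, lon)

print(call_with_retries(flaky_reverse_geocode, 12.9724, 77.5985, delay_s=0.01))
```

In a real run, `retry_on` would be narrowed to the rate-limit exception of the geocoding client so that other errors are not silently retried.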

In [32]:
from opencage.geocoder import InvalidInputError, RateLimitExceededError, UnknownError

def get_locality_address(api_key, lat, long):
    # api_key is kept for compatibility; the module-level `geocoder` created earlier is used
    try:
        results = geocoder.reverse_geocode(lat, long)
        if results:
            return results[0]['formatted']
    except RateLimitExceededError:
        pass
    return None  # no result, or the rate limit was exceeded

addr = get_locality_address(GEO_KEY,blr_center[0],blr_center[1]) 
print(addr)
Shantala Nagar, Bengaluru - 560001PH, Karnataka, India
In [16]:
bangalore_localities = []
for lat, lon in zip(latitudes, longitudes):
    local_addr = get_locality_address(GEO_KEY, lat, lon)
    if local_addr is None:
        local_addr = 'NO ADDRESS'
    local_addr = local_addr.replace(', India', '') # We don't need country part of address
    bangalore_localities.append(local_addr)
In [27]:
bangalore_localities[100:110]
Out[27]:
['Ganakkal, Sompura - 560062, Karnataka',
 'Vajrahalli, Sompura - 560062, Karnataka',
 'Hemmigepura, Bengaluru - 560062, Karnataka',
 'Vajrahalli, Gubbalala - 560062, Karnataka',
 'Anjanapura, Bengaluru - 560062, Karnataka',
 'Anjanapura, Bengaluru - 560062, Karnataka',
 'Anjanapura, Bengaluru - 560083, Karnataka',
 'Gottigere Ward, Bengaluru - 560083, Karnataka',
 'Gottigere Ward, Bengaluru - 560083, Karnataka',
 '1st Main, Ramanashree Nagar, Begur, Bengaluru - 560083, Karnataka']
In [18]:
# Now we create a composite dataframe having all the localities of Bangalore with addresses, coordinates, distance from center 
blr_locations = pd.DataFrame({'Address': bangalore_localities,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

blr_locations.head()
Out[18]:
Address Latitude Longitude X Y Distance from center
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769

Foursquare

We use the Foursquare API to get information on Gyms/Fitness centers in each location. We will also determine the Yoga Centers in each location. In addition, we shall fetch the number of offices, residential locations, colleges and commercial venues in each cell as well as in Bangalore as a whole. Collectively these will be referred to as amenities. This will help us in assessing a cell's profile.

  • From the number of amenities we can infer the presence of a community, i.e. the presence of a customer base.
  • This will also help us eliminate areas which are remote and sparsely populated.
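
The request functions in this notebook assemble the Foursquare query URL with `str.format`. As an alternative sketch, the same query string can be built with `urllib.parse.urlencode`, which takes care of escaping. The endpoint and parameter names are the ones used in this notebook; the credential, version and category values below are placeholders invented for this example, and `build_explore_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lon,
                      category_id, radius_m, limit):
    """Build the explore URL for one location and one venue category."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lon),
        "categoryId": category_id,
        "radius": radius_m,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder values, for illustration only
url = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                        12.9724, 77.5985, "CATEGORY_ID", 500, 100)
print(url)
```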
In [19]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']    
In [20]:
def get_local_feature(lat,lon,category,rradius):
    url =('https://api.foursquare.com/v2/venues/explore?client_id={}'
          '&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}').format(CLIENT_ID, 
                                                                  CLIENT_SECRET, 
                                                                  VERSION, 
                                                                  lat, 
                                                                  lon,
                                                                  category,  
                                                                  rradius,
                                                                  LIMIT)
        
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        nearby_venues = json_normalize(results)
        # Filter the columns
        filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.distance']
        nearby_venues = nearby_venues.loc[:, filtered_columns]
        # Filter the category for each row
        nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis = 1)
        # Clean all column names
        nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
        # print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
    except Exception:
        # Request failed or no venues were returned: fall back to an empty result
        nearby_venues = []
            
    return nearby_venues


     

    
In [36]:
# Getting the Distribution of Offices and Residences in Bangalore 

def get_location_profile(lat,lon,radius,limit,categories):
    
    profile = {}
    
    
    for category in categories:

        url =('https://api.foursquare.com/v2/venues/explore?client_id={}'
               '&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}').format(CLIENT_ID, 
                                                                  CLIENT_SECRET, 
                                                                  VERSION, 
                                                                  lat, 
                                                                  lon,
                                                                  category,  
                                                                  radius,
                                                                  limit)
        try:
            results = requests.get(url).json()['response']['groups'][0]['items']
            results = json_normalize(results)
            # Filter the columns
            filtered_columns = ['venue.id','venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng','venue.location.distance']
            results = results.loc[:, filtered_columns]
            # Filter the category for each row
            results['venue.categories'] = results.apply(get_category_type, axis = 1)
            # Clean all column names
            results.columns = [col.split(".")[-1] for col in results.columns]

            profile[category] = results
        except Exception:
            # Request failed or no venues were returned: fall back to an empty result
            profile[category] = []
            
    return profile
#        print('{} offices were returned by Foursquare.'.format(results.shape[0]))
            
    
In [34]:
categories =  [OFFICE,HOME,COLLEGE,COMMERCE]
blr_profile = get_location_profile(blr_center[0],blr_center[1],18000,1000,categories)
pr_xs = []
pr_ys = []
for category in categories:
    for i in blr_profile[category].index:
        x,y = lonlat_to_xy(blr_profile[category]['lng'][i],blr_profile[category]['lat'][i])
        pr_xs.append(x)
        pr_ys.append(y)
        
    blr_profile[category]['xs'] = pr_xs
    blr_profile[category]['ys'] = pr_ys
    pr_xs = []
    pr_ys = []
    
        
    
    
In [35]:
blr_profile[OFFICE].head()
Out[35]:
id name categories lat lng distance xs ys
0 4bebb94561aca593885d8500 Toyota India Corporate Office Office 12.971625 77.595967 287 781629.358542 1.435431e+06
1 4e54ceff1f6e7ab6b1b64441 TNT Office 12.965733 77.603708 933 782476.361020 1.434787e+06
2 4f03d75c0e61e937ce874cd8 Madison Platinum Office Office 12.981899 77.601903 1119 782262.170396 1.436574e+06
3 4dc14f03c65b268b41d82b86 ESPN Digital Media (India) Private Limited Office 12.961511 77.599495 1217 782023.807946 1.434315e+06
4 4be13706c1732d7f33785b9a Du Parc Trinity Office 12.972480 77.618951 2219 784123.454505 1.435551e+06
In [24]:
print ('Number of Offices :',len(blr_profile[OFFICE]))
print('Number of Residential locations :',len(blr_profile[HOME]))
print('Number of Centers with Commercial activities:',len(blr_profile[COMMERCE]))
print('Number of Colleges',len(blr_profile[COLLEGE]))
Number of Offices : 100
Number of Residential locations : 52
Number of Centers with Commercial activities: 100
Number of Colleges 41
In [45]:
# Let's go over all the localities and collect Gyms, Yoga Centers and amenity profiles for each
def compute_feature(lats,lons):
    gyms = {}
    yoga_centers = {}
    residences   = {}
    ofcs     = {}
    location_gyms = []
    location_yoga = []
    location_profile = []
    ofc_cnt       = 0
    cats    = [OFFICE,HOME,COMMERCE,COLLEGE]
    counter = 0
    print('Analysizing the area around the localities')
    for lat,lon in zip(lats,lons):
        
       
        
        counter += 1
        
        profile = get_location_profile(lat,lon,500,100,cats)
        
        location_profile.append(profile)
        
        
        v_gyms   = get_local_feature(lat,lon,GYM,550)
        local_gyms = []
        for i in range(0,len(v_gyms)):
            gym_name = v_gyms['name'][i]
            gym_vid  = v_gyms['id'][i]
            gym_cat  = v_gyms['categories'][i]
            gym_lat  = v_gyms['lat'][i]
            gym_lon  = v_gyms['lng'][i]
            gym_dist = v_gyms['distance'][i]
            x,y      = lonlat_to_xy(gym_lon, gym_lat)
            
            gym_d    = (gym_vid,gym_name,gym_cat,gym_lat,gym_lon,gym_dist,x,y)
            if gym_dist <= 500:
                local_gyms.append(gym_d)
                
            gyms[gym_vid] = gym_d
            
        location_gyms.append(local_gyms)    
            
        v_yoga   = get_local_feature(lat,lon,YOGA,550)
        local_ygc  = []
        for j in range(0,len(v_yoga)):
            yg_vid  = v_yoga['id'][j]
            yg_name = v_yoga['name'][j]
            yg_cat  = v_yoga['categories'][j]
            yg_lat  = v_yoga['lat'][j]
            yg_lon  = v_yoga['lng'][j]
            yg_dist = v_yoga['distance'][j]
            x,y      = lonlat_to_xy(yg_lon, yg_lat)
            yog_d     = (yg_vid,yg_name,yg_cat,yg_lat,yg_lon,yg_dist,x,y)
        
            if yg_dist <= 500:
                local_ygc.append(yog_d)
            yoga_centers[yg_vid] = yog_d
                
        location_yoga.append(local_ygc)
        
        if counter%100 == 0:
            print ('Analyzed {} locations'.format(counter))
    
    return gyms,location_gyms,yoga_centers,location_yoga,location_profile        
        
In [50]:
gyms,location_gyms,yoga_centers,location_yoga,location_profile = compute_feature(latitudes,longitudes)
print('Total number of Gyms:', len(gyms))
print('Total number of Yoga Centers:', len(yoga_centers))
Analysizing the area around the localities
Analyzed 100 locations
Analyzed 200 locations
Analyzed 300 locations
Analyzed 400 locations
Analyzed 500 locations
Analyzed 600 locations
Analyzed 700 locations
Analyzed 800 locations
Analyzed 900 locations
Analyzed 1000 locations
Total number of Gyms: 655
Total number of Yoga Centers: 66
In [51]:
local_gym_count = [len(res) for res in location_gyms]
local_ygc_count = [len(yg) for yg in location_yoga ]
#print(len(local_gym_count))
blr_locations['Gyms'] = local_gym_count
blr_locations['YogaCenters'] = local_ygc_count 
In [52]:
blr_locations.head()
Out[52]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0
In [53]:
print (len(location_profile))
1009
In [60]:
# Let's add the amenity counts for each location
ofc_cnt = []
hm_cnt =[]
shp_cnt = []
col_cnt = []

for i in range (0,len(location_profile)):
    ofc_cnt.append(len(location_profile[i][OFFICE]))
    hm_cnt.append( len(location_profile[i][HOME]))
    shp_cnt.append(len(location_profile[i][COMMERCE]))
    col_cnt.append(len(location_profile[i][COLLEGE]))
In [61]:
blr_locations['Offices'] = ofc_cnt
blr_locations['Residential'] = hm_cnt
blr_locations['Commercial']  = shp_cnt
blr_locations['Colleges']   = col_cnt
blr_locations.head()
Out[61]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0
In [72]:
print (location_profile[0][COLLEGE]['lng'])
0    77.594948
Name: lng, dtype: float64
In [79]:
categories_ = [OFFICE,HOME,COMMERCE,COLLEGE]
map_bangalore = folium.Map(location=blr_center, zoom_start=11)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
for gym in gyms.values():
    lat = gym[3]
    lon = gym[4]
    folium.CircleMarker([lat, lon], radius=3, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
for yog in yoga_centers.values():
    lat = yog[3]
    lon = yog[4]
    folium.CircleMarker([lat, lon], radius=3, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)



        
#folium.GeoJson(blr_local,style_function=bbmp,name='geojson').add_to(map_bangalore)

map_bangalore
Out[79]:

The above map shows the distribution of Gyms and Yoga centers in Bangalore

In [81]:
for cat in categories_:
    for lat,lon in zip(blr_profile[cat]['lat'],blr_profile[cat]['lng']):
          folium.CircleMarker([lat, lon], radius=3, color='green', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)

#marker_cluster = folium.MarkerCluster().add_to(map_bangalore)
for i in blr_locations.index:
    for cat in categories_:
        if len(location_profile[i][cat]) > 0: 
            for n in range (0,len(location_profile[i][cat])):
                la = location_profile[i][cat]['lat'][n]
                lo = location_profile[i][cat]['lng'][n]
                
                folium.CircleMarker([la,lo ], radius=3, color='green', fill=True,fill_color='red', fill_opacity=1).add_to(map_bangalore)
map_bangalore
Out[81]:

The above map shows the distribution of the various amenities.

In [82]:
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
folium.Marker(blr_center, popup='Bangalore').add_to(map_bangalore)
for i in blr_locations.index:
    clr = 'red'
    if blr_locations['Gyms'][i] == 0 and blr_locations['YogaCenters'][i] == 0:
        clr = 'yellow'
        if blr_locations['Offices'][i] > 0 or blr_locations['Residential'][i] > 0 or blr_locations['Colleges'][i] > 0 or blr_locations['Commercial'][i] > 0:
            clr = 'green'
    folium.Circle([blr_locations['Latitude'][i], blr_locations['Longitude'][i]],radius=500,color=clr,fill_color=clr, fill=True, fill_opacity=0.3).add_to(map_bangalore)
for cat in categories_:
    for lat,lon in zip(blr_profile[cat]['lat'],blr_profile[cat]['lng']):
          folium.CircleMarker([lat, lon], radius=3, color='green', fill=True, fill_color='red', fill_opacity=1).add_to(map_bangalore)
map_bangalore  
Out[82]:

We now have a view of all Gyms and Yoga Centers in the city, along with an overall profile of the locations that have the amenities of our choice.

In the above folium map, areas in red already have a Gym or a Yoga Center. Areas in green have neither but contain at least one of the amenities of our choice, so they will be investigated further. Areas in yellow have no fitness center but no amenities either, so they could be ignored. We still investigate them further, since some pockets of these zones may be in close vicinity of an amenity.

This concludes the data gathering phase; we shall use the above data in the analysis that follows.

Methodology

In this project we direct our efforts at detecting areas of Bangalore that do not have a Gym or Yoga Center. We also look for locations that have offices, colleges, residential areas and commercial centers either within the location or in close vicinity.

First we identify areas lacking a Gym or Yoga Center but having the aforesaid amenities. These locations can be taken into consideration right away.

Next are the locations that have neither a Gym or Yoga Center nor the aforesaid amenities. Such a location can still be considered a potential location if an amenity lies within 4 km of its center.
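
The selection rules above can be sketched as a small function. This is only an illustration of the rules as worded in this section, not the notebook's own implementation (the Analysis section uses SQL queries for this step). The function name `classify_cell`, the dictionary layout and the `max_amenity_distance` parameter are assumptions chosen to mirror the column names used in this report. The first three sample cells copy values from rows shown in the tables, and the fourth is hypothetical.

```python
def classify_cell(cell, max_amenity_distance=4000.0):
    """Classify one grid cell as 'primary', 'secondary' or 'excluded'.

    `cell` is a plain dict whose keys are assumed to mirror the report's
    column names. Distances are in meters.
    """
    # A cell that already has a Gym or Yoga Center is not a candidate.
    if cell["Gyms"] > 0 or cell["YogaCenters"] > 0:
        return "excluded"

    # A cell with at least one amenity is a primary area of interest.
    amenity_keys = ("Offices", "Residential", "Commercial", "Colleges")
    if any(cell[key] > 0 for key in amenity_keys):
        return "primary"

    # No amenity inside the cell: keep it only if one is close enough.
    if 0 < cell["Distance to Amenity"] < max_amenity_distance:
        return "secondary"
    return "excluded"


empty = {"Gyms": 0, "YogaCenters": 0, "Offices": 0,
         "Residential": 0, "Commercial": 0, "Colleges": 0}
cells = [
    {**empty, "Colleges": 1, "Distance to Amenity": 0.0},
    {**empty, "Distance to Amenity": 3000.0},
    {**empty, "Distance to Amenity": 4000.0},
    {**empty, "Gyms": 2, "Offices": 3, "Distance to Amenity": 0.0},  # hypothetical
]
labels = [classify_cell(cell) for cell in cells]
print(labels)
```

The strict `<` comparison means a cell exactly 4000 m from the nearest amenity is excluded, which is a design choice of this sketch.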

Analysis

In [46]:
create_download_link(blr_locations, filename="blr_locations.csv")
In [83]:
blr_locations.head()
Out[83]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0

Let's now refine our raw data to gain further insights. We know there are many locations that do not have a Gym or Yoga Center, so first let's compute the distance to the nearest Yoga Center for each location. Likewise, we shall compute the distance to the nearest Gym and to the nearest amenity for each location. This will help us further narrow down the areas of interest.

In [84]:
# For every area center, find the distance (in meters) to the nearest Yoga Center.
distances_to_yoga_center = []

for area_x, area_y in zip(xs, ys):
    min_distance = 20000  # cap: anything farther than 20 km is recorded as 20 km
    for res in yoga_centers.values():
        res_x = res[6]
        res_y = res[7]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d < min_distance:
            min_distance = d
    distances_to_yoga_center.append(min_distance)

blr_locations['Distance to Yoga Center'] = distances_to_yoga_center

blr_locations.head()
Out[84]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1 10415.087317
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0 5496.632531
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0 6381.578727
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0 7296.308993
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 8230.899270
In [85]:
print('Average distance to closest Yoga Center from each area center:', blr_locations['Distance to Yoga Center'].mean())
Average distance to closest Yoga Center from each area center: 4378.063135575886
In [86]:
# For every area center, find the distance (in meters) to the nearest Gym.
distances_to_gym = []

for area_x, area_y in zip(xs, ys):
    min_distance = 20000  # cap: anything farther than 20 km is recorded as 20 km
    for res in gyms.values():
        res_x = res[6]
        res_y = res[7]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d < min_distance:
            min_distance = d
    distances_to_gym.append(min_distance)

blr_locations['Distance to Gym'] = distances_to_gym

blr_locations.head()
Out[86]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center Distance to Gym
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1 10415.087317 5062.130636
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0 5496.632531 4560.348385
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0 6381.578727 4922.195531
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0 7296.308993 4467.824516
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 8230.899270 4107.108364
In [87]:
print('Average distance to closest Gym from each area center:', blr_locations['Distance to Gym'].mean())
Average distance to closest Gym from each area center: 2157.908825268211

So on average the nearest Gym is about 2.2 km from an area center, and the nearest Yoga Center is about 4.4 km away.

In [92]:
len(location_profile)
Out[92]:
1009
In [94]:
categories = [OFFICE,HOME,COMMERCE,COLLEGE]

# Indices of the cells that contain at least one amenity of any category.
cells_with_amenity = [i for i in blr_locations.index
                      if any(len(location_profile[i][cat]) > 0 for cat in categories)]

# For every area center, find the distance to the nearest cell that has an amenity.
distance_to_nearest_amenity = []
for area_x, area_y in zip(xs, ys):
    min_distance = 20000
    for i in cells_with_amenity:
        d = calc_xy_distance(area_x, area_y, blr_locations['X'][i], blr_locations['Y'][i])
        if d < min_distance:
            min_distance = d
    distance_to_nearest_amenity.append(min_distance)

blr_locations['Distance to Amenity'] = distance_to_nearest_amenity
In [108]:
print('Average distance to closest Amenity from each area center:', blr_locations['Distance to Amenity'].mean())
Average distance to closest Amenity from each area center: 1045.929016782276
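
The nearest-distance loops in this section all follow the same pattern, so they could be factored into one helper. The sketch below is a minimal illustration under two assumptions: that `calc_xy_distance` is plain Euclidean distance on projected, meter-based coordinates, and that venues are available as `(x, y)` tuples. The helper name `nearest_distance` and the sample points are hypothetical.

```python
import math


def nearest_distance(origin, targets, cap=20000.0):
    """Return the distance from `origin` to the closest point in `targets`.

    Points are (x, y) tuples in meters. The result never exceeds `cap`,
    mirroring the 20000 m starting value used in the loops above.
    """
    ox, oy = origin
    distances = (math.hypot(tx - ox, ty - oy) for tx, ty in targets)
    return min(cap, min(distances, default=cap))


# Hypothetical sample data: two area centers and two venues.
area_centers = [(0.0, 0.0), (1000.0, 0.0)]
venues = [(3000.0, 4000.0), (1000.0, 1000.0)]

nearest = [nearest_distance(center, venues) for center in area_centers]
print(nearest)
```

With such a helper, each distance column could be built with one list comprehension over the area centers, and the 20 km cap stays in a single place.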
In [95]:
blr_locations.head()
Out[95]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center Distance to Gym Distance to Amenity
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1 10415.087317 5062.130636 0.0
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0 5496.632531 4560.348385 4000.0
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0 6381.578727 4922.195531 3000.0
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0 7296.308993 4467.824516 2000.0
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 8230.899270 4107.108364 1000.0
In [97]:
create_download_link(blr_locations,title = "Download CSV file", filename = "blr_locations.csv")
In [5]:
blr_locations = df_data_1
blr_locations.head()
Out[5]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center Distance to Gym Distance to Amenity
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1 10415.087317 5062.130636 0.0
1 Byatarayanadoddi, Anekal 12.819288 77.550878 776902.49553 1.418520e+06 17720.045147 0 0 0 0 0 0 5496.632531 4560.348385 4000.0
2 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0 6381.578727 4922.195531 3000.0
3 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0 7296.308993 4467.824516 2000.0
4 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 8230.899270 4107.108364 1000.0

The probable areas should be ones near homes, offices, colleges and commercial places.

Let's filter for locations that have at least one amenity but do not have a Gym or Yoga Center.

In [6]:
!pip install -U pandasql
Collecting pandasql
  Downloading https://files.pythonhosted.org/packages/6b/c4/ee4096ffa2eeeca0c749b26f0371bd26aa5c8b611c43de99a4f86d3de0a7/pandasql-0.7.3.tar.gz
Requirement already satisfied, skipping upgrade: numpy in /opt/conda/envs/Python36/lib/python3.6/site-packages (from pandasql) (1.15.4)
Requirement already satisfied, skipping upgrade: pandas in /opt/conda/envs/Python36/lib/python3.6/site-packages (from pandasql) (0.24.1)
Requirement already satisfied, skipping upgrade: sqlalchemy in /opt/conda/envs/Python36/lib/python3.6/site-packages (from pandasql) (1.2.18)
Requirement already satisfied, skipping upgrade: pytz>=2011k in /opt/conda/envs/Python36/lib/python3.6/site-packages (from pandas->pandasql) (2018.9)
Requirement already satisfied, skipping upgrade: python-dateutil>=2.5.0 in /opt/conda/envs/Python36/lib/python3.6/site-packages (from pandas->pandasql) (2.7.5)
Requirement already satisfied, skipping upgrade: six>=1.5 in /opt/conda/envs/Python36/lib/python3.6/site-packages (from python-dateutil>=2.5.0->pandas->pandasql) (1.12.0)
Building wheels for collected packages: pandasql
  Building wheel for pandasql (setup.py) ... done
  Stored in directory: /home/dsxuser/.cache/pip/wheels/53/6c/18/b87a2e5fa8a82e9c026311de56210b8d1c01846e18a9607fc9
Successfully built pandasql
Installing collected packages: pandasql
Successfully installed pandasql-0.7.3
In [7]:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
In [9]:
areas_of_interest = pd.DataFrame() 
areas_of_interest = pysqldf("SELECT * FROM   blr_locations WHERE YogaCenters = 0 and Gyms = 0 and (Offices > 0 or  Residential > 0 or Commercial > 0  or Colleges > 0);")

print('Total number of Primary Areas:',len(areas_of_interest))
Total number of Primary Areas: 313
In [10]:
areas_of_interest.head()
Out[10]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center Distance to Gym Distance to Amenity
0 Bangalore Urban, Sakalavara - 560105, Karnataka 12.809804 77.596813 781902.49553 1.417520e+06 18000.000000 0 0 0 0 0 1 10415.087317 5062.130636 0.0
1 Bannerughatta, Anekal 12.818928 77.587700 780902.49553 1.418520e+06 17029.386366 0 0 0 0 0 1 9179.285411 3971.803410 0.0
2 Bangalore Urban, Kalkere - 560105, Karnataka 12.818838 77.596906 781902.49553 1.418520e+06 17000.000000 0 0 0 0 0 1 9423.523741 4084.373335 0.0
3 Kalkere, Anekal 12.827962 77.587793 780902.49553 1.419520e+06 16031.219542 0 0 0 0 1 6 8340.462729 2971.895236 0.0
4 Electronics City Phase 2 (West), Hulimangala -... 12.827320 77.652232 787902.49553 1.419520e+06 17088.007491 0 0 0 0 1 0 9363.172758 1683.635641 0.0
In [11]:
secondary_areas   = pysqldf("select * from blr_locations where YogaCenters = 0 and Gyms = 0 and Offices = 0 and Residential = 0 and Commercial = 0 and Colleges = 0 and [Distance to Amenity] > 0 and [Distance to Amenity] < 4000 and [Distance to Yoga Center] > 5000")
#pysqldf("SELECT * FROM blr_locations WHERE [Gyms in the Area]> 1 ").head()
print('Total Number of Secondary Area of Interest : ',len(secondary_areas))
Total Number of Secondary Area of Interest :  220
In [12]:
secondary_areas.head()
Out[12]:
Address Latitude Longitude X Y Distance from center Gyms YogaCenters Offices Residential Commercial Colleges Distance to Yoga Center Distance to Gym Distance to Amenity
0 Byatarayanadoddi, Anekal 12.819199 77.560084 777902.49553 1.418520e+06 17464.249197 0 0 0 0 0 0 6381.578727 4922.195531 3000.0
1 Gottigere, Bangalore South 12.819109 77.569289 778902.49553 1.418520e+06 17262.676502 0 0 0 0 0 0 7296.308993 4467.824516 2000.0
2 Bannerughatta, Anekal 12.819019 77.578495 779902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 8230.899270 4107.108364 1000.0
3 Bangalore Urban, Sakalavara - 560105, Karnataka 12.818747 77.606111 782902.49553 1.418520e+06 17029.386366 0 0 0 0 0 0 9610.985212 4391.658311 1000.0
4 Bangalore Urban, Sakalavara - 560105, Karnataka 12.818655 77.615317 783902.49553 1.418520e+06 17117.242769 0 0 0 0 0 0 9896.427324 4103.964013 2000.0
In [25]:
map_bangalore = folium.Map(location=blr_center, zoom_start=10)
for lat, lon in zip(areas_of_interest['Latitude'], areas_of_interest['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore) 
    folium.Circle([lat, lon],radius=500,color='blue', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
for lat, lon in zip(secondary_areas['Latitude'], secondary_areas['Longitude']):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore) 
    folium.Circle([lat, lon],radius=500,color='yellow', fill=False).add_to(map_bangalore)
    #folium.Marker([lat, lon]).add_to(map_bangalore)
map_bangalore
Out[25]:
In [40]:
from sklearn.cluster import KMeans

number_of_clusters = 15

good_xys = areas_of_interest[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
In [43]:
map_bangalore = folium.Map(location=blr_center, zoom_start=11)
#folium.TileLayer('cartodbpositron').add_to(map_bangalore)
#HeatMap(ygc_latlons).add_to(map_bangalore)
folium.Circle(blr_center, radius=18000, color='blue', fill=False, fill_opacity=0.4).add_to(map_bangalore)
folium.Marker(blr_center).add_to(map_bangalore)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_bangalore) 
    folium.Marker([lat, lon]).add_to(map_bangalore) 
for lat, lon in zip(areas_of_interest['Latitude'], areas_of_interest['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
#folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_bangalore)
map_bangalore
Out[43]:
In [44]:
blr_center_x, blr_center_y = lonlat_to_xy(blr_center[1],blr_center[0])
candidate_area_addresses = []
print('==============================================================')
print('Addresses of Primary Locations  recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_locality_address(GEO_KEY, lat, lon).replace(', India', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, blr_center_x, blr_center_y)
    print('{}{} => {:.1f}km from Bangalore Center'.format(addr, ' '*(50-len(addr)), d/1000))
==============================================================
Addresses of Primary Locations  recommended for further analysis
==============================================================

Rayapuram Ward, Bengaluru - 560026, Karnataka      => 4.6km from Bangalore Center
K R Puram, Bengaluru - 560 036, Karnataka          => 12.8km from Bangalore Center
Begur, Bengaluru - 560076, Karnataka               => 12.4km from Bangalore Center
4th Cross Road, Ganga Nagar Ward, Bengaluru - 560094, Karnataka => 6.3km from Bangalore Center
BWSSB Pipeline Track, Herohalli, Bengaluru - 560091, Karnataka => 11.1km from Bangalore Center
Kengeri Main Road, Jnana Bharathi, Bengaluru - 560074, Karnataka => 12.1km from Bangalore Center
Jogupalya, Bengaluru - 560007, Karnataka           => 3.2km from Bangalore Center
India International School, Gunjur Varthur Main Road, Varthuru, Bengaluru - 560087, Karnataka => 14.4km from Bangalore Center
Krishna Leela Park, Pipeline Road, Vasanthapura, Bengaluru - 560082, Karnataka => 10.5km from Bangalore Center
1st Cross Road, Horamavu, Bengaluru - 560 077, Karnataka => 9.5km from Bangalore Center
Bellandur Lake, 7th Cross Road, Bellanduru, Bengaluru - 560017, Karnataka => 8.8km from Bangalore Center
Bangalore Elevated Tollways Limited, Begur, Bengaluru - 560 100, Karnataka => 14.7km from Bangalore Center
Umiya Woods, Ecumenical Christian Center Road, Hagadur, Bengaluru - 560066, Karnataka => 15.9km from Bangalore Center
HMT Ward, Bengaluru - 560057, Karnataka            => 11.0km from Bangalore Center
Hemmigepura, Bengaluru - 560074, Karnataka         => 14.7km from Bangalore Center
In [35]:
number_of_clusters = 15

ok_xys = secondary_areas[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(ok_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_bangalore = folium.Map(location=blr_center, zoom_start=12)
#folium.TileLayer('cartodbpositron').add_to(map_bangalore)
#HeatMap(ygc_latlons).add_to(map_bangalore)
folium.Circle(blr_center, radius=18000, fill=False, fill_opacity=0.4).add_to(map_bangalore)
folium.Marker(blr_center).add_to(map_bangalore)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_bangalore) 
    folium.Marker([lat, lon]).add_to(map_bangalore)
for lat, lon in zip(secondary_areas['Latitude'], secondary_areas['Longitude']):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_bangalore)
#folium.GeoJson(berlin_boroughs, style_function=boroughs_style, name='geojson').add_to(map_bangalore)
map_bangalore
Out[35]:
In [37]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of Secondary Locations recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_locality_address(GEO_KEY, lat, lon).replace(', India', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, blr_center_x, blr_center_y)
    print('{}{} => {:.1f}km from Bangalore Center'.format(addr, ' '*(50-len(addr)), d/1000))
==============================================================
Addresses of Secondary Locations recommended for further analysis
==============================================================

Thanisandra, Bengaluru - 56077, Karnataka          => 11.8km from Bangalore Center
Gottigere, - 560083, Karnataka                     => 15.0km from Bangalore Center
Bangalore Urban, Machohalli - 560018, Karnataka    => 15.3km from Bangalore Center
Bangalore Urban, Kodati - 560035, Karnataka        => 16.5km from Bangalore Center
Bangalore Urban, Avalahalli - 560049, Karnataka    => 16.9km from Bangalore Center
Anchepalya, Chikka Banavara - 560090, Karnataka    => 16.2km from Bangalore Center
Challaghatta, Kommaghatta - 560074, Karnataka      => 16.5km from Bangalore Center
Hudi, Bengaluru - 56077, Karnataka                 => 13.3km from Bangalore Center
Yarandahalli, Hulimangala - 560100, Karnataka      => 16.8km from Bangalore Center
Begur-Koppa Road, Begur, Bengaluru - 560105, Karnataka => 14.4km from Bangalore Center
Dodda Bidarakallu, Bengaluru - 560073, Karnataka   => 14.8km from Bangalore Center
Ullalu Upanagara, Kannalli - 560110, Karnataka     => 15.6km from Bangalore Center
Begur, Bengaluru - 560 100, Karnataka              => 16.0km from Bangalore Center
K. Channasandra, Bidarahalli - 560049, Karnataka   => 15.5km from Bangalore Center
Shettihalli, Bengaluru - 560090, Karnataka         => 13.8km from Bangalore Center

This concludes our analysis. We have created 15 addresses for the primary zones and 15 for the secondary zones. Each address represents the center of a zone containing locations with no Gyms or Yoga Centers that are still close to residential, office or commercial locations. Although the zones are shown on the map with a radius of ~500 meters (green circles), their shape is actually very irregular, and their centers/addresses should be considered only as a starting point for exploring area neighborhoods in search of potential locations for a Yoga Center.

Results and Discussion

Our analysis shows that a city as populous as Bangalore is catered to by only a few hundred Gyms and even fewer Yoga Centers. Thus there is great potential for a new Yoga Center or even a multi-speciality fitness center.

Initially we divided the entire area of Bangalore into circular cells of 500 m radius and analyzed each location. This gave us data showing which locations have fitness centers and which lack them entirely.
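
The cell layout can be sketched as follows. This is a hedged reconstruction, not the grid-generation code itself: it assumes a square grid with 1000 m spacing inside an 18 km radius, values inferred from the X/Y columns and the map circle shown in this report. The function name `grid_cell_centers` and its parameters are hypothetical.

```python
def grid_cell_centers(center_x, center_y, radius=18000, spacing=1000):
    """Return (x, y) cell centers of a square grid within `radius` of the center.

    All arguments are integers in meters (assumed projected coordinates).
    """
    steps = radius // spacing
    centers = []
    for j in range(-steps, steps + 1):
        for i in range(-steps, steps + 1):
            # Integer comparison avoids floating-point issues on the boundary.
            if (i * i + j * j) * spacing * spacing <= radius * radius:
                centers.append((center_x + i * spacing, center_y + j * spacing))
    return centers


cells = grid_cell_centers(0, 0)
print(len(cells))  # 1009
```

With these assumed values the sketch yields 1009 cell centers, the same number reported by `len(location_profile)` in the Analysis section.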

We assessed the feasibility of each location by understanding its location profile. If a location has offices, homes, commercial centers or colleges, this primarily indicates that the cell is inhabited and that some economic activity can be inferred. Such locations without a fitness center should therefore be our primary areas of interest. There could also be areas that are remote but still within a manageable distance of a community center. Such locations form the secondary set of locations that could be considered.

We then used KMeans clustering to arrive at 15 potential primary locations and 15 potential secondary locations.

Those location candidates were clustered to create zones of interest that contain the greatest number of location candidates. The addresses of the centers of those zones were then generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
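
To illustrate how clustering turns scattered candidate cells into a few zone centers, here is a minimal pure-Python sketch of Lloyd's algorithm, the basic k-means iteration. It is not the scikit-learn `KMeans` used in the analysis: the evenly spaced initialization is a deterministic simplification (scikit-learn defaults to k-means++), and the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm on (x, y) tuples; returns the k cluster centers."""
    # Simplified, deterministic initialization: evenly spaced input points.
    centers = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(center)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


# Hypothetical candidate cells (meters) forming two well-separated groups.
candidates = [(0, 0), (0, 1000), (1000, 0),
              (10000, 10000), (10000, 11000), (11000, 10000)]
zone_centers = kmeans(candidates, k=2)
print(zone_centers)
```

Each returned center is the mean position of the cells in its group, which is why a zone center may fall between candidate cells rather than on one of them.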

Conclusion

The purpose of this project was to identify potential locations for starting a new Yoga Center or fitness center in Bangalore. Using Foursquare data we gained insight into every location, including the count of Gyms and Yoga Centers, and obtained an overview of each location's profile, which helped us narrow down the potential locations.

We then clustered those locations to identify general zones which could be used as starting points for further exploration.

The final decision on the optimal Yoga Center or fitness center location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, real estate availability, prices, and the social and economic dynamics of every neighborhood.
